Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses
Abstract
This paper describes the 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) in Barcelona. Emphasis is put on improvements to and extensions of the previous year's system, which are highlighted and empirically compared. Chiefly, these include a novel word ordering strategy based on (1) statistically monotonizing the training source corpus and (2) a novel reordering approach that uses weighted reordering graphs. In addition, the system introduces a target language model based on statistical classes, a feature for out-of-domain units, and an improved optimization procedure. The paper provides details of the system's participation in the ACL 2007 Second Workshop on Statistical Machine Translation. Results are reported for three language pairs, namely Spanish, French and German into English (and the reverse directions), for both the in-domain and out-of-domain tasks.
Similar resources
Using Linear Interpolation and Weighted Reordering Hypotheses in the Moses System
This paper proposes to introduce a novel reordering model in the open-source Moses toolkit. The main idea is to provide weighted reordering hypotheses to the SMT decoder. These hypotheses are built using a first-step Ngram-based SMT translation from a source language into a third representation that is called reordered source language. Each hypothesis has its own weight provided by the Ngram-ba...
An Ngram-based reordering model
This paper describes in detail a novel approach to the reordering challenge in statistical machine translation (SMT). This Ngram-based reordering (NbR) approach uses the powerful techniques of SMT systems to generate a weighted reordering graph. Thus, statistical criteria reordering constraints are supplied to an SMT system, and this allows an extension to the SMT decoding search. The NbR appro...
Computing multiple weighted reordering hypotheses for a statistical machine translation phrase-based systems
Reordering is one source of error in statistical machine translation (SMT). This paper extends the study of the statistical machine reordering (SMR) approach, which uses the powerful techniques of SMT systems to solve reordering problems. Here, the novelties lie in: (1) using the SMR approach in an SMT phrase-based system, (2) adding a feature function in the SMR step, and (3) analyzing th...
N-gram-based SMT System Enhanced with Reordering Patterns
This work presents translation results for the three data sets made available in the shared task “Exploiting Parallel Texts for Statistical Machine Translation” of the HLT-NAACL 2006 Workshop on Statistical Machine Translation. All results presented were generated by using the Ngram-based statistical machine translation system which has been enhanced from the last year’s evaluation with a tagge...
Reordered Search and Tuple Unfolding for Ngram-based SMT
In Statistical Machine Translation, the use of reordering for certain language pairs can produce a significant improvement in translation accuracy. However, the search problem is shown to be NP-hard when arbitrary reorderings are allowed. This paper addresses the question of reordering for an Ngram-based SMT approach following two complementary strategies, namely reordered search and tuple unfo...